Titile

Assignment 11 uses the Maftools package Mayakonda et al. (2018) in R to demonstrate the data from the TCGA database, including ESCA, AML Stoll and Akbani (2013), and BRCA cancer projects. In the following sections, I will explain the code chunk individually and how the data hook up with the functions.

Part 6 Load Required input files

6-2 Reading MAF files

This Section has two sub-sections. The first one loads the Maftools package, which already has the TCGA data. We can extract the maf file in maf.gz and the annotation from .tsv file separately to create a MAF file object. By this, we can further handle through default functions such as getSampleSummary and getGeneSummary to quickly access specific parts of data for an overview.

## -Reading
## -Validating
## -Silent variants: 475 
## -Summarizing
## -Processing clinical data
## -Finished in 0.389s elapsed (0.364s cpu)

6-3 MAF object

Second section illustrates the functions that skim the MAF object, which has multiple datasets such as genes and sample data; the clinical data are TCGA samples annotating related cancer stages and survival information. The Frame Shift are nucleotides that change the reading frame, resulting in a non-functional protein; in contrast, the In Frame insertion and deletion do not alter the reading frame but can only affect the protein sequence. We can apply the “write.mafSummary” function to export an object. In the TCGA database use case, we can use GDCdownload and GDCprepare to set up our study database after selecting the necessary data for analysis.

## An object of class  MAF 
##                    ID          summary  Mean Median
##  1:        NCBI_Build               37    NA     NA
##  2:            Center genome.wustl.edu    NA     NA
##  3:           Samples              193    NA     NA
##  4:            nGenes             1241    NA     NA
##  5:   Frame_Shift_Del               52 0.269      0
##  6:   Frame_Shift_Ins               91 0.472      0
##  7:      In_Frame_Del               10 0.052      0
##  8:      In_Frame_Ins               42 0.218      0
##  9: Missense_Mutation             1342 6.953      7
## 10: Nonsense_Mutation              103 0.534      0
## 11:       Splice_Site               92 0.477      0
## 12:             total             1732 8.974      9
##    Tumor_Sample_Barcode Frame_Shift_Del Frame_Shift_Ins In_Frame_Del
## 1:         TCGA-AB-3009               0               5            0
## 2:         TCGA-AB-2807               1               0            1
## 3:         TCGA-AB-2959               0               0            0
## 4:         TCGA-AB-3002               0               0            0
## 5:         TCGA-AB-2849               0               1            0
## 6:         TCGA-AB-2923               1               1            0
##    In_Frame_Ins Missense_Mutation Nonsense_Mutation Splice_Site total
## 1:            1                25                 2           1    34
## 2:            0                16                 3           4    25
## 3:            0                22                 0           1    23
## 4:            0                15                 1           5    21
## 5:            0                16                 1           2    20
## 6:            0                15                 3           0    20
##    Hugo_Symbol Frame_Shift_Del Frame_Shift_Ins In_Frame_Del In_Frame_Ins
## 1:        FLT3               0               0            1           33
## 2:      DNMT3A               4               0            0            0
## 3:        NPM1               0              33            0            0
## 4:        IDH2               0               0            0            0
## 5:        IDH1               0               0            0            0
## 6:        TET2              10               4            0            0
##    Missense_Mutation Nonsense_Mutation Splice_Site total MutatedSamples
## 1:                15                 0           3    52             52
## 2:                39                 5           6    54             48
## 3:                 1                 0           0    34             33
## 4:                20                 0           0    20             20
## 5:                18                 0           0    18             18
## 6:                 4                 8           1    27             17
##    AlteredSamples
## 1:             52
## 2:             48
## 3:             33
## 4:             20
## 5:             18
## 6:             17
##    Tumor_Sample_Barcode FAB_classification days_to_last_followup
## 1:         TCGA-AB-2802                 M4                   365
## 2:         TCGA-AB-2803                 M3                   792
## 3:         TCGA-AB-2804                 M3                  2557
## 4:         TCGA-AB-2805                 M0                   577
## 5:         TCGA-AB-2806                 M1                   945
## 6:         TCGA-AB-2807                 M1                   181
##    Overall_Survival_Status
## 1:                       1
## 2:                       1
## 3:                       0
## 4:                       1
## 5:                       1
## 6:                       1
##  [1] "Hugo_Symbol"            "Entrez_Gene_Id"         "Center"                
##  [4] "NCBI_Build"             "Chromosome"             "Start_Position"        
##  [7] "End_Position"           "Strand"                 "Variant_Classification"
## [10] "Variant_Type"           "Reference_Allele"       "Tumor_Seq_Allele1"     
## [13] "Tumor_Seq_Allele2"      "Tumor_Sample_Barcode"   "Protein_Change"        
## [16] "i_TumorVAF_WU"          "i_transcript_name"

Part 7 Visualization

The maftools provides multiple visualization function that fits the cancer genomic datasets TCGA nicely. Here we make six subsections to discuss each plot function and its parameters individually.

7.1 Plotting MAF summary.

The first “plotmafSummary” takes MAF obj input and generates a dashboard that provides an overview of various quality control metrics and mutation signatures. The bol value of rmOutlier, dashboard, and titvRaw shows if you want to remove the outlier from data, show the dashboard summary or calculate the raw Ti/Tv ratio (transitions (Ti) to transversions (Tv) in a DNA sequence) to evaluate the quality of variant calling. Finally, if applicable, the addStat is an option to add statistical references, such as the median on the plot.

7.2 Oncoplots

The “oncoplots” select the top 10 genes with the most mutation samples and illustrate its tumor mutational burden on the top and annotation of explanation mutation type, for example, In Frame/Frame Shift/Nonsense on the location, and shows where to be more likely to respond to immune checkpoint inhibitors.

7.3 Transition and Transversions.

The “plotTiTv” first uses the titv function to compute the transition-to-transversion (Ti/Tv) ratio for somatic variants in a given MAF file, a measure of the types of substitutions that occur in a dataset of DNA mutations. Then, the plotTiTv plots the result accordingly.

7.4 Lollipop plots for amino acid changes

The Lollipop plots investigate a particular gene on its mutational location and frequency. Here we focus on the DNMT3A gene, showing its mutation rate. The “AACol” parameter can specify the mutation(s) location in the MAF file by Protein Change column. The “labelPos=882” is the y-axis position of the mutation label in the lollipop plot.

The “plotProtein” function inputs a gene name and reference sequence ID. The reference sequence ID retrieves the protein sequence from a database. It generates a plot that shows the location of exons, domains, and post-translational modifications on the protein sequence.

## 3 transcripts available. Use arguments refSeqID or proteinID to manually specify tx name.
##      HGNC refseq.ID protein.ID aa.length
## 1: DNMT3A NM_022552  NP_072046       912
## 2: DNMT3A NM_153759  NP_715640       723
## 3: DNMT3A NM_175629  NP_783328       912
## Using longer transcript NM_022552 for now.
## Removed 3 mutations for which AA position was not available

7.5 Rainfall plots

The rainfall plot shows the density of mutations across the genome. The x-axis is chromosomes, and the y-axis represents the density of mutations at that position. The “pointSize = 0.4” parameter sets the size of the points on the plot. The “detectChangePoints = TRUE” parameter in this example enables the detection of significant change points and highlights those regions with black arrows.

## Processing TCGA-A8-A08B..

7.6 Compare mutation load against TCGA cohorts

The tcgaCompare function inputs the laml MAF file object and specifies the cohort name “Example-LAML” to compare with others in the TCGA database. Thus, the x-axis is all the cancer cohort, the y-axis is the tumor mutational burden (TBM) per MB, and the y-axis The capture size =50 argument sets the threshold that includes only the capture size greater than 50 for the analysis. The “logscale = TRUE” argument specifies the TBM in a logarithmic scale.

In the “plotvaf” function, create a plot of the variant allele frequency (VAF) distribution for the LAML dataset. The vafCol = ‘i_TumorVAF_WU’ argument specifies which column in the MAF file to use for the VAF data. The x-axis shows the VAF values on genes, and the y-axis illustrates the density of mutations on a scale of 0 to 1. As we can see, the DNMT3A with a high VAF value may suggest clonal mutations.

## Warning in FUN(X[[i]], ...): Removed 1 samples with zero mutations.
## Capture size [TCGA]:  35.8
## Capture size [Input]: 50
## Performing pairwise t-test for differences in mutation burden (per MB)..

Part 9 Analysis

In the data analysis section, we will further explore the analysis funciton in the following subsections.

9.1 Somatic Interactions

First, the somaticInteractions generate a network plot of significant co-occurring and mutually exclusive gene mutations in a given MAF file. The green means highly correlated, and the asteroid sign indicates its p-value as the confidential interval of correlation assumption. On the other hand, The dark brown color represents the two independent gene mutations.

##      gene1  gene2       pValue oddsRatio  00 11 01 10        pAdj
##   1: ASXL1  RUNX1 0.0001541586 55.215541 176  4 12  1 0.003568486
##   2:  IDH2  RUNX1 0.0002809928  9.590877 164  7  9 13 0.006055880
##   3:  IDH2  ASXL1 0.0004030636 41.077327 172  4  1 16 0.008126283
##   4:  FLT3   NPM1 0.0009929836  3.763161 125 17 16 35 0.018664260
##   5:  SMC3 DNMT3A 0.0010451985 20.177713 144  6 42  1 0.018664260
##  ---                                                             
## 296: PLCE1  ASXL1 1.0000000000  0.000000 184  0  5  4 1.000000000
## 297: RAD21  FAM5C 1.0000000000  0.000000 183  0  5  5 1.000000000
## 298: PLCE1  FAM5C 1.0000000000  0.000000 184  0  5  4 1.000000000
## 299: PLCE1  RAD21 1.0000000000  0.000000 184  0  5  4 1.000000000
## 300:  EZH2  PLCE1 1.0000000000  0.000000 186  0  4  3 1.000000000
##                   Event         pair event_ratio
##   1:       Co_Occurence ASXL1, RUNX1        4/13
##   2:       Co_Occurence  IDH2, RUNX1        7/22
##   3:       Co_Occurence  ASXL1, IDH2        4/17
##   4:       Co_Occurence   FLT3, NPM1       17/51
##   5:       Co_Occurence DNMT3A, SMC3        6/43
##  ---                                            
## 296: Mutually_Exclusive ASXL1, PLCE1         0/9
## 297: Mutually_Exclusive FAM5C, RAD21        0/10
## 298: Mutually_Exclusive FAM5C, PLCE1         0/9
## 299: Mutually_Exclusive PLCE1, RAD21         0/9
## 300: Mutually_Exclusive  EZH2, PLCE1         0/7

9.2 Detecting cancer driver genes based on positional clustering

To detect the driver genes of cancer, we can use the “oncodrive” function that takes MAF file as input, and the AACol parameter specifies the mutation(s) location/gene in the MAF file by Protein Change. The “minMut” is the threshold to pick up the gene above specific mutation numbers to be considered a potential driver, and pvalMehtod is for calculating the statistical significance of the observed mutations in each gene.

The “plotOncodrive” function generates a volcano plot of the “oncodrive” results. The “fdrCutOff” sets the FDR method’s threshold for the adjusted p-value to avoid a false discovery rate (FDR) when determining significance. The “useFraction” decides whether to use the fraction of mutated samples in each cluster as the x-axis or the number of mutations. The “labelSize” controls the minimum size of the labels for each gene in the plot.

## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |===                                                                   |   4%
  |                                                                            
  |======                                                                |   9%
  |                                                                            
  |=========                                                             |  13%
  |                                                                            
  |============                                                          |  17%
  |                                                                            
  |===============                                                       |  22%
  |                                                                            
  |==================                                                    |  26%
  |                                                                            
  |=====================                                                 |  30%
  |                                                                            
  |========================                                              |  35%
  |                                                                            
  |===========================                                           |  39%
  |                                                                            
  |==============================                                        |  43%
  |                                                                            
  |=================================                                     |  48%
  |                                                                            
  |=====================================                                 |  52%
  |                                                                            
  |========================================                              |  57%
  |                                                                            
  |===========================================                           |  61%
  |                                                                            
  |==============================================                        |  65%
  |                                                                            
  |=================================================                     |  70%
  |                                                                            
  |====================================================                  |  74%
  |                                                                            
  |=======================================================               |  78%
  |                                                                            
  |==========================================================            |  83%
  |                                                                            
  |=============================================================         |  87%
  |                                                                            
  |================================================================      |  91%
  |                                                                            
  |===================================================================   |  96%
  |                                                                            
  |======================================================================| 100%
##    Hugo_Symbol Frame_Shift_Del Frame_Shift_Ins In_Frame_Del In_Frame_Ins
## 1:        IDH1               0               0            0            0
## 2:        IDH2               0               0            0            0
## 3:        NPM1               0              33            0            0
## 4:        NRAS               0               0            0            0
## 5:       U2AF1               0               0            0            0
## 6:         KIT               1               1            0            1
##    Missense_Mutation Nonsense_Mutation Splice_Site total MutatedSamples
## 1:                18                 0           0    18             18
## 2:                20                 0           0    20             20
## 3:                 1                 0           0    34             33
## 4:                15                 0           0    15             15
## 5:                 8                 0           0     8              8
## 6:                 7                 0           0    10              8
##    AlteredSamples clusters muts_in_clusters clusterScores protLen   zscore
## 1:             18        1               18     1.0000000     414 5.546154
## 2:             20        2               20     1.0000000     452 5.546154
## 3:             33        2               32     0.9411765     294 5.093665
## 4:             15        2               15     0.9218951     189 4.945347
## 5:              8        1                7     0.8750000     240 4.584615
## 6:              8        2                9     0.8500000     976 4.392308
##            pval          fdr fract_muts_in_clusters
## 1: 1.460110e-08 1.022077e-07              1.0000000
## 2: 1.460110e-08 1.022077e-07              1.0000000
## 3: 1.756034e-07 8.194826e-07              0.9411765
## 4: 3.800413e-07 1.330144e-06              1.0000000
## 5: 2.274114e-06 6.367520e-06              0.8750000
## 6: 5.607691e-06 1.308461e-05              0.9000000

9.3 Adding and summarizing pfam domains

The pfamDomains function creates a pfam domain object that identifies the Pfam domains affected by mutations in the input MAF file. The AACol parameter specifies the column containing the protein change information, and the top parameter limits the number of top-ranked Pfam domains to display. The resulting object contains two data frames: proteinSummary, which summarizes the mutations per protein, and domainSummary, which summarizes the mutations per Pfam domain.

##      HGNC AAPos Variant_Classification  N total  fraction   DomainLabel
## 1: DNMT3A   882      Missense_Mutation 27    54 0.5000000 AdoMet_MTases
## 2:   IDH1   132      Missense_Mutation 18    18 1.0000000      PTZ00435
## 3:   IDH2   140      Missense_Mutation 17    20 0.8500000      PTZ00435
## 4:   FLT3   835      Missense_Mutation 14    52 0.2692308      PKc_like
## 5:   FLT3   599           In_Frame_Ins 10    52 0.1923077      PKc_like
## 6:  U2AF1    34      Missense_Mutation  7     8 0.8750000       zf-CCCH
##        DomainLabel nMuts nGenes
## 1:        PKc_like    55      5
## 2:        PTZ00435    38      2
## 3:   AdoMet_MTases    33      1
## 4:           7tm_1    24     24
## 5:         COG5048    17     17
## 6: Cadherin_repeat    16     16

9.4 Survival analysis

To perform survival analysis, we can use “mafSurvival” function. The function returns a data frame with survival information, such as the number of subjects at risk, the number of events, the survival probability, and the median survival time for each group. The function can also take multiple genes for grouping and perform survival analysis for each gene separately.

For predicting associated genes with survival, we can use the “survGroup” function takes a MAF file as input and identifies a set of genes associated with poor survival. The top parameter specifies the number of top mutated genes to consider, and the “geneSetSize” parameter sets the size of the gene set to predict.

The mafSurvGroup function is used to plot survival curves for multiple gene sets in one plot. It takes a MAF file object as input and the “geneSet” parameter specifies the gene set(s) to compare. The “time” parameter indicates the column in the MAF file containing the time variable, and the “Status” parameter indicates the column containing the status variable. The function returns a survival plot with a legend for each gene set. The x-axis represents the survival time, and the y-axis represents the probability of survival.

## DNMT3A 
##     48 
##     Group medianTime   N
## 1: Mutant        245  45
## 2:     WT        396 137

##     Gene_combination P_value    hr  WT Mutant
##  1:      FLT3_DNMT3A 0.00104 2.510 164     18
##  2:      DNMT3A_SMC3 0.04880 2.220 176      6
##  3:      DNMT3A_NPM1 0.07190 1.720 166     16
##  4:      DNMT3A_TET2 0.19600 1.780 176      6
##  5:        FLT3_TET2 0.20700 1.860 177      5
##  6:        NPM1_IDH1 0.21900 0.495 176      6
##  7:      DNMT3A_IDH1 0.29300 1.500 173      9
##  8:       IDH2_RUNX1 0.31800 1.580 176      6
##  9:        FLT3_NPM1 0.53600 1.210 165     17
## 10:      DNMT3A_IDH2 0.68000 0.747 178      4
## 11:      DNMT3A_NRAS 0.99200 0.986 178      4
##     Group medianTime   N
## 1: Mutant      242.5  18
## 2:     WT      379.5 164

9.5 Comparing two cohorts (MAFs)

In comparing two cohorts from different MAF files, we first read them as mentioned in the first section. Here use Primary APL and Relapse APL for the explanation. First, we use “mafCompare” function to compare two cohorts and save in “pt.vs.rt” data frame.

The “forestPlot” function generates a forest plot that displays each mutation’s effect size and confidence interval for comparing two groups. The mafCompareRes parameter takes the output of the “mafCompare” function, and pVal sets the threshold for significance.

Similar to the Oncoplot function , the “coOncoplot” function has a gene parameter that can assign an array of genes to compare among cohorts. As a result, it generates a co-occurring oncogene plot that compares the frequency of mutations in the given set of genes between two groups.

## -Reading
## -Validating
## --Non MAF specific values in Variant_Classification column:
##   ITD
## -Silent variants: 45 
## -Summarizing
## -Processing clinical data
## --Missing clinical data
## -Finished in 0.098s elapsed (0.092s cpu)
## -Reading
## -Validating
## --Non MAF specific values in Variant_Classification column:
##   ITD
## -Silent variants: 19 
## -Summarizing
## -Processing clinical data
## --Missing clinical data
## -Finished in 0.076s elapsed (0.073s cpu)
## $results
##    Hugo_Symbol Primary Relapse         pval         or       ci.up      ci.low
## 1:         PML       1      11 1.529935e-05 0.03537381   0.2552937 0.000806034
## 2:        RARA       0       7 2.574810e-04 0.00000000   0.3006159 0.000000000
## 3:       RUNX1       1       5 1.310500e-02 0.08740567   0.8076265 0.001813280
## 4:        FLT3      26       4 1.812779e-02 3.56086275  14.7701728 1.149009169
## 5:      ARID1B       5       8 2.758396e-02 0.26480490   0.9698686 0.064804160
## 6:         WT1      20      14 2.229087e-01 0.60619329   1.4223101 0.263440988
## 7:        KRAS       6       1 4.334067e-01 2.88486293 135.5393108 0.337679367
## 8:        NRAS      15       4 4.353567e-01 1.85209500   8.0373994 0.553883512
## 9:      ARID1A       7       4 7.457274e-01 0.80869223   3.9297309 0.195710173
##         adjPval
## 1: 0.0001376942
## 2: 0.0011586643
## 3: 0.0393149868
## 4: 0.0407875250
## 5: 0.0496511201
## 6: 0.3343630535
## 7: 0.4897762916
## 8: 0.4897762916
## 9: 0.7457273717
## 
## $SampleSummary
##     Cohort SampleSize
## 1: Primary        124
## 2: Relapse         58

##    HGNC refseq.ID protein.ID aa.length
## 1:  PML NM_002675  NP_002666       633
## 2:  PML NM_033238  NP_150241       882
## 3:  PML NM_033239  NP_150242       829
## 4:  PML NM_033240  NP_150243       611
## 5:  PML NM_033244  NP_150247       560
## 6:  PML NM_033246  NP_150249       423
## 7:  PML NM_033247  NP_150250       435
## 8:  PML NM_033249  NP_150252       585
## 9:  PML NM_033250  NP_150253       781
##    HGNC refseq.ID protein.ID aa.length
## 1:  PML NM_002675  NP_002666       633
## 2:  PML NM_033238  NP_150241       882
## 3:  PML NM_033239  NP_150242       829
## 4:  PML NM_033240  NP_150243       611
## 5:  PML NM_033244  NP_150247       560
## 6:  PML NM_033246  NP_150249       423
## 7:  PML NM_033247  NP_150250       435
## 8:  PML NM_033249  NP_150252       585
## 9:  PML NM_033250  NP_150253       781

9.6 Clinical enrichment analysis

In the enrichment analysis, the clinicalEnrichment function specified clinical features equal to “FAB_classification,” which divides leukemia into M0 to M7 stages and is used as annotation for the “laml” MAF file, saved as “fab.ce” object.

In the “fab.ce” object, we can access the groupwise_comparision data by a dollar sign and filter out data p-value equal to or higher than 5%. Lastly, we can visualize the result of enrichment by the plotEnrichmentResults function.

## 
## M0 M1 M2 M3 M4 M5 M6 M7 
## 19 44 44 21 39 19  3  3
##    Hugo_Symbol Group1 Group2 n_mutated_group1 n_mutated_group2      p_value
## 1:        IDH1     M1   Rest         11 of 44         7 of 149 0.0002597371
## 2:        TP53     M7   Rest           3 of 3        12 of 190 0.0003857187
## 3:      DNMT3A     M5   Rest         10 of 19        38 of 174 0.0089427384
## 4:       CEBPA     M2   Rest          7 of 44         6 of 149 0.0117352110
## 5:       RUNX1     M0   Rest          5 of 19        11 of 174 0.0117436825
## 6:        NPM1     M5   Rest          7 of 19        26 of 174 0.0248582372
## 7:        NPM1     M3   Rest          0 of 21        33 of 172 0.0278630823
## 8:      DNMT3A     M3   Rest          1 of 21        47 of 172 0.0294005111
##          OR      OR_low    OR_high       fdr
## 1: 6.670592 2.173829026 21.9607250 0.0308575
## 2:      Inf 5.355415451        Inf 0.0308575
## 3: 3.941207 1.333635173 11.8455979 0.3757978
## 4: 4.463237 1.204699322 17.1341278 0.3757978
## 5: 5.216902 1.243812880 19.4051505 0.3757978
## 6: 3.293201 1.001404899 10.1210509 0.5880102
## 7: 0.000000 0.000000000  0.8651972 0.5880102
## 8: 0.133827 0.003146708  0.8848897 0.5880102

9.7 Drug-Gene Interactions

The “drugInteractions” function is to identify potential drug-gene interactions based on mutations in genes that are known to be associated with drug response. It takes a MAF file as input and returns a data frame with potential drug-gene interactions, including the gene’s name, the type of mutation, the drugs associated with the gene, and the evidence supporting that interaction.

In the example, we search the interaction of the drug by specifying the gene “DNMT3A” and save into “dnmt3a.dgi” object. Then we can further print out the columns we are interested in, like Gene, interaction_types, and drug_name, as illustrated.

## Number of claimed drugs for given genes:
##      Gene N
## 1: DNMT3A 7
##      Gene interaction_types    drug_name drug_claim_name
## 1: DNMT3A                                            N/A
## 2: DNMT3A                   DAUNORUBICIN    Daunorubicin
## 3: DNMT3A                     DECITABINE      Decitabine
## 4: DNMT3A                     IDARUBICIN      IDARUBICIN
## 5: DNMT3A                     DECITABINE      DECITABINE
## 6: DNMT3A         inhibitor   DECITABINE   CHEMBL1201129
## 7: DNMT3A         inhibitor  AZACITIDINE      CHEMBL1489

9.8 Oncogenic Signaling Pathways

The OncogenicPathways function generates a summary table of the oncogenic pathways identified in the MAF file and the number of samples each pathway affects. The “PlotOncogenicPathways” function can be used to plot the genes involved in a specific oncogenic pathway, along with their mutation status in the samples. These functions are useful for exploring the oncogenic pathways commonly altered in a particular cancer type and understanding the potential targets for therapy.

##    Pathway  N n_affected_genes fraction_affected Mutated_samples
## 1:    PI3K 29                1        0.03448276               1
## 2:    NRF2  3                1        0.33333333               1
## 3:    TP53  6                2        0.33333333              15
## 4:     WNT 68                3        0.04411765               4
## 5:     MYC 13                3        0.23076923               3
## 6:   NOTCH 71                6        0.08450704               8
## 7:   Hippo 38                7        0.18421053               7
## 8: RTK-RAS 85               18        0.21176471              97
##    Fraction_mutated_samples
## 1:              0.005181347
## 2:              0.005181347
## 3:              0.077720207
## 4:              0.020725389
## 5:              0.015544041
## 6:              0.041450777
## 7:              0.036269430
## 8:              0.502590674

Reference

Mayakonda, Anand, De-Chen Lin, Yassen Assenov, Christoph Plass, and H Phillip Koeffler. 2018. “Maftools: Efficient and Comprehensive Analysis of Somatic Variants in Cancer.” Genome Res. 28 (11): 1747–56. https://doi.org/10.1101/gr.239244.118.
Stoll, Dominik, and Rehan Akbani. 2013. “Genomic and Epigenomic Landscapes of Adult de Novo Acute Myeloid Leukemia.” New England Journal of Medicine, May. https://doi.org/10.1056/NEJMoa1301689.